NSF PAR Search | NSF Public Access Repository

GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization under Large Viewpoint Changes

https://doi.org/10.1109/CVPR52734.2025.01559

Li, Yunxuan; Fan, Lei; Xing, Xiaoying; Zhou, Jianxiong; Wu, Ying (June 2025, IEEE)

Visual localization, the task of determining the position and orientation of a camera, typically involves three core components: offline construction of a keyframe database, efficient online keyframe retrieval, and robust local feature matching. However, significant challenges arise when there are large viewpoint disparities between the query view and the database, such as attempting localization in a corridor that was previously mapped from the opposite direction. Intuitively, this issue can be addressed by synthesizing a set of virtual keyframes that cover all viewpoints. However, existing methods for synthesizing novel views to assist localization often fail to ensure geometric accuracy under large viewpoint changes. In this paper, we introduce a confidence-aware geometric prior into 2D Gaussian splatting to ensure the geometric accuracy of the scene. We can then render novel views through the mesh with clear structures and accurate geometry, even under significant viewpoint changes, enabling the synthesis of a comprehensive set of virtual keyframes. Incorporating this geometry-preserving virtual keyframe database into the localization pipeline significantly enhances the robustness of visual localization.
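As a rough illustration of the online retrieval step named in the abstract above, the sketch below ranks database keyframes by cosine similarity between global descriptors. It is not the authors' implementation: the function names `cosine_similarity` and `retrieve_keyframes`, the list-of-floats descriptor format, and the choice of cosine similarity are all assumptions made for illustration. Under this assumed setup, virtual keyframes would simply be additional entries in the database list.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero descriptor as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_keyframes(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (index, score) for the top_k most similar database keyframes.

    Hypothetical helper: 'database' holds one global descriptor per keyframe,
    real or virtual. How descriptors are computed is outside this sketch.
    """
    scored = [(i, cosine_similarity(query, d)) for i, d in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The retrieved keyframes would then be passed to local feature matching and pose estimation, which this sketch does not cover.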
Free, publicly-accessible full text available June 10, 2026
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations

Fan, Lei; Zhou, Jianxiong; Xing, Xiaoying; Wu, Ying (June 2024, The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR))

Full Text Available
Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization

https://doi.org/10.1109/WACV56688.2023.00597

Zhou, Jianxiong; Wu, Ying (January 2023, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

Weakly-supervised Temporal Action Localization (WTAL) aims to classify and localize action instances in untrimmed videos with only video-level labels. Existing methods typically use snippet-level RGB and optical flow features extracted from pre-trained extractors directly. Because of two limitations, namely the short temporal span of snippets and the inappropriate initial features, these WTAL methods make ineffective use of temporal information and have limited performance. In this paper, we propose the Temporal Feature Enhancement Dilated Convolution Network (TFE-DCN) to address these two limitations. The proposed TFE-DCN has an enlarged receptive field that covers a long temporal span to observe the full dynamics of action instances, which makes it effective at capturing temporal dependencies between snippets. Furthermore, we propose the Modality Enhancement Module, which enhances RGB features with the help of enhanced optical flow features, making the overall features appropriate for the WTAL task. Experiments conducted on the THUMOS’14 and ActivityNet v1.3 datasets show that our proposed approach far outperforms state-of-the-art WTAL methods.
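To make the "enlarged receptive field" claim concrete, the sketch below computes the temporal receptive field of a stack of stride-1 dilated 1D convolutions, using the standard relation that each layer adds (kernel_size - 1) * dilation positions. The kernel size of 3 and the dilation rates 1, 2, 4 are assumptions chosen for illustration; the abstract does not state the values used in TFE-DCN.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in snippets) of stacked stride-1 dilated 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1  # a single output position sees itself before any convolution
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed example settings, not taken from the paper.
plain = receptive_field(kernel_size=3, dilations=[1, 1, 1])    # 7 snippets
dilated = receptive_field(kernel_size=3, dilations=[1, 2, 4])  # 15 snippets
```

With the same number of layers and parameters, the dilated stack in this example covers roughly twice the temporal span of the undilated one.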
Full Text Available
